1 Introduction

In this workshop, we will explore efficient ways of writing Nextflow scripts and learn about the basic folder structure that should be maintained in a Nextflow pipeline.

3 Tutorial

We will cover the modularization and configuration of Nextflow scripts.

3.1 What is modularization?

Modularization means dividing a Nextflow script into smaller, self-contained parts. It simplifies the writing of complex data analysis workflows and makes processes easier to re-use.

3.2 How can it be done?

It is achieved by converting the workflow’s processes into modules, which can then be called within the workflow scope in a variety of ways.
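For example, a simple process can be moved into its own module script and then included where it is needed. This is a minimal sketch; the SAY_HELLO process and file paths are hypothetical and not part of the pipeline developed in this tutorial.

```groovy
// modules/say_hello.nf — a stand-alone module script (hypothetical)
process SAY_HELLO {
    input:
    val name

    output:
    stdout

    script:
    """
    echo "Hello, ${name}!"
    """
}
```

```groovy
// main.nf — the module is imported with `include` and called inside the workflow scope
include { SAY_HELLO } from './modules/say_hello'

workflow {
    SAY_HELLO(Channel.of('world')) | view
}
```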

3.3 Creating a nf-core pipeline

A new pipeline based on the nf-core template can be created using the nf-core create command.

Example:

nf-core create -n testpipeline -d "Test pipeline" -a "Diya" --plain
The generated folder structure can be inspected with:

tree -d nf-core-testpipeline/

3.4 Modules

Stand-alone module scripts can be included and shared across multiple workflows. Each module can contain its own process or workflow definition.

Example:
cat bin/nf-modules/modules/cutadapt.nf
#!/usr/bin/env nextflow

process CUTADAPT {

    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
        'https://depot.galaxyproject.org/singularity/cutadapt:3.4--py39h38f01e4_1' :
        'biocontainers/cutadapt:3.4--py39h38f01e4_1' }"

    input:
    tuple val(meta), path(reads)

    output:
    tuple val(meta), path('*.trim.fastq.gz'), emit: reads
    tuple val(meta), path('*.log')          , emit: log
    path "versions.yml"                     , emit: versions

    when:
    task.ext.when == null || task.ext.when

    script:
    def args = task.ext.args ?: ''
    def prefix = task.ext.prefix ?: meta
    def trimmed  = params.single_end ? "-o ${prefix}.trim.fastq.gz" : "-o ${prefix}_1.trim.fastq.gz -p ${prefix}_2.trim.fastq.gz"
    """
    cutadapt \\
        $args \\
        $trimmed \\
        $reads \\
        > ${prefix}.cutadapt.log
    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        cutadapt: \$(cutadapt --version)
    END_VERSIONS
    """

    stub:
    def prefix  = task.ext.prefix ?: meta
    def trimmed = params.single_end ? "${prefix}.trim.fastq.gz" : "${prefix}_1.trim.fastq.gz ${prefix}_2.trim.fastq.gz"
    """
    touch ${prefix}.cutadapt.log
    touch ${trimmed}

    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        cutadapt: \$(cutadapt --version)
    END_VERSIONS
    """
}

3.4.1 Creating/installing modules using nf-core tools

  • Modules can be installed directly using the nf-core modules install command if the module is already present in nf-core. Note that an nf-core module can only be installed in a standard nf-core pipeline. The module will be placed in the modules/nf-core directory.

    Example:

    nf-core modules install bowtie2/align
  • If the module is not already available, it can be created manually, or a template can be generated using the nf-core modules create command.

    Example:

    nf-core modules create tximport

    The generated files can be inspected with:

    tree nf-core-testpipeline/modules/
  • A list of existing nf-core modules can be obtained with the nf-core modules list remote command.

    Example:

    nf-core modules list remote

3.4.2 Importing Modules

  • Components defined in the module script can be imported into other Nextflow scripts using the include statement.

  • This allows these components to be stored in separate files so that they can be re-used in multiple workflows.

    Example:

    include { BOWTIE2_BUILD } from '/home/diya/nf-modules/modules/bowtie2_build'
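    Once included, the component can be invoked inside a workflow scope just like a function. A minimal sketch (the fasta_ch channel is hypothetical):

```groovy
include { BOWTIE2_BUILD } from '/home/diya/nf-modules/modules/bowtie2_build'

workflow {
    fasta_ch = Channel.fromPath(params.fasta)  // hypothetical input channel
    BOWTIE2_BUILD(fasta_ch)
}
```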

3.4.3 Module aliases

  • When including a module component, it is possible to specify a name alias using the as declaration.

  • This allows the same component to be included and invoked multiple times under different names.

    Example:

    include { BOWTIE2_ALIGN as BOWTIE2_ALIGN_CUTADAPT } from '/home/diya/nf-modules/modules/bowtie2_align'
    include { BOWTIE2_ALIGN as BOWTIE2_ALIGN_TRIMMOMATIC } from '/home/diya/nf-modules/modules/bowtie2_align'
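    Each alias can then be invoked independently within the workflow scope. A minimal sketch (the input channels are hypothetical):

```groovy
workflow {
    // the same BOWTIE2_ALIGN module invoked twice under different names
    BOWTIE2_ALIGN_CUTADAPT(cutadapt_reads, index_ch, false, false)
    BOWTIE2_ALIGN_TRIMMOMATIC(trimmomatic_reads, index_ch, false, false)
}
```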

3.5 Subworkflows

  • Two or more module scripts can be grouped into a subworkflow script.
  • Generally the modules performing related functions are grouped together.
  • The subworkflows can be imported into a workflow using the include statement.

3.5.1 Creating/installing subworkflows using nf-core tools

  • Subworkflows can be installed directly using the nf-core subworkflows install command if the subworkflow is already present in nf-core. Note that an nf-core subworkflow can only be installed in a standard nf-core pipeline. The subworkflow will be placed in the subworkflows/nf-core directory.

    Example:

    nf-core subworkflows install bam_markduplicates_samtools
  • If the subworkflow is not already available, it can be created manually, or a template can be generated using the nf-core subworkflows create command.

    Example:

    nf-core subworkflows create bowtie2

    The generated files can be inspected with:

    tree nf-core-testpipeline/subworkflows
  • A list of existing nf-core subworkflows can be obtained with the nf-core subworkflows list remote command.

    Example:

    nf-core subworkflows list remote

3.5.2 Subworkflow Structure

The syntax is defined as follows:

include { < NAME > } from < PATH >
< Importing the modules to be called in the subworkflow >

workflow < NAME > {
  take:
  < Declaring workflow inputs >
  main:
  < Processes are called and inputs are passed as arguments >
  emit:
  < Workflow output to be emitted >
}
Example:
cat bin/nf-modules/subworkflows/bowtie2.nf
#!/usr/bin/env nextflow

include { BOWTIE2_BUILD    } from '/home/diya/nf-modules/modules/bowtie2_build'
include { BOWTIE2_ALIGN as BOWTIE2_ALIGN_CUTADAPT } from '/home/diya/nf-modules/modules/bowtie2_align'
include { BOWTIE2_ALIGN as BOWTIE2_ALIGN_TRIMMOMATIC } from '/home/diya/nf-modules/modules/bowtie2_align'

workflow BOWTIE2 {
    take:
    fasta_ch // channel: [ path(fasta) ]
    cutadapt_reads
    trimmomatic_reads

    main:
    ch_versions = Channel.empty()

    BOWTIE2_BUILD(fasta_ch)
    ch_versions = ch_versions.mix(BOWTIE2_BUILD.out.versions)

    BOWTIE2_ALIGN_CUTADAPT(cutadapt_reads, BOWTIE2_BUILD.out.index, false, false)
    ch_versions = ch_versions.mix(BOWTIE2_ALIGN_CUTADAPT.out.versions)

    BOWTIE2_ALIGN_TRIMMOMATIC(trimmomatic_reads, BOWTIE2_BUILD.out.index, false, false)
    ch_versions = ch_versions.mix(BOWTIE2_ALIGN_TRIMMOMATIC.out.versions)

    emit:
    indexes = BOWTIE2_BUILD.out.index
    cutadapt_aligned    = BOWTIE2_ALIGN_CUTADAPT.out.aligned
    cutadapt_log        = BOWTIE2_ALIGN_CUTADAPT.out.log
    trimmomatic_aligned = BOWTIE2_ALIGN_TRIMMOMATIC.out.aligned
    trimmomatic_log     = BOWTIE2_ALIGN_TRIMMOMATIC.out.log

    versions = ch_versions
}

3.6 Workflows

All the modules and subworkflows are called here, and the input parameters are passed to them.

3.6.1 Workflow Structure

  • Import the modules and subworkflows using include

  • Declare input channels and parameters, if required

  • Call the processes and subworkflows, passing the input parameters

Example:
cat bin/nf-modules/workflows/trimalign.nf
#!/usr/bin/env nextflow

nextflow.enable.dsl=2

// CHANNEL
sample_ch = Channel.fromFilePairs(params.input, checkIfExists: true)
fasta_ch = Channel.fromPath(params.fasta, checkIfExists: true)


// Include modules 

include { CUTADAPT }    from "/home/diya/nf-modules/modules/cutadapt"       
include { TRIMMOMATIC } from "/home/diya/nf-modules/modules/trimmomatic" 


// Include subworkflow

include { BOWTIE2 }    from "/home/diya/nf-modules/subworkflows/bowtie2"


// WORKFLOW
workflow TRIMALIGN {

    CUTADAPT(sample_ch)
    TRIMMOMATIC(sample_ch)
    BOWTIE2(fasta_ch, CUTADAPT.out.reads, TRIMMOMATIC.out.trimmed_reads)

}

workflow.onComplete {
    println "SUCCESSFUL"
}
   

3.7 Main script

main.nf is typically the entry script that is run with the nextflow run command to launch the whole pipeline. The workflow is imported and invoked here: when the main script runs, it invokes the workflow, which in turn invokes the modules and subworkflows, and as a result the processes are executed.

Example:

cat bin/nf-modules/main.nf
#!/usr/bin/env nextflow

nextflow.enable.dsl=2

// Include workflow
include { TRIMALIGN } from "/home/diya/nf-modules/workflows/trimalign"


workflow {
    TRIMALIGN()
}

3.8 Nextflow configuration

A key Nextflow feature is the ability to decouple the workflow implementation from the configuration settings required by the underlying execution platform. This enables portable deployment without the need to modify the application code.

3.8.1 Nextflow.config file

  • When a workflow script is launched, Nextflow looks for a file named nextflow.config in the current directory and in the script base directory (if it is not the same as the current directory). Finally, it checks for the file: $HOME/.nextflow/config.
  • When more than one nextflow.config file exists, they are merged, so that the settings in the first override the same settings that may appear in the second, and so on.
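For example, a parameter defined in the home-directory config is overridden by the same parameter in the launch-directory config. A minimal sketch (the outdir values are hypothetical):

```groovy
// $HOME/.nextflow/config — lowest priority of the merged files
params.outdir = 'results_global'
```

```groovy
// ./nextflow.config in the launch directory — takes precedence,
// so params.outdir resolves to 'results_local'
params.outdir = 'results_local'
```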

3.8.2 Use of config file

  • The config file is used to pass values to different variables or params, such as input file paths, the output directory path, or variables specifying process directives.

  • Configuration properties can be used as variables in the configuration file itself, by using the usual $propertyName or ${expression} syntax.

  • The scope params allows the definition of workflow parameters that override the values defined in the main workflow script.

  • It can also be used to specify the mode of execution of the pipeline, i.e. whether to use Docker, Conda, Singularity, etc.

Example:

cat bin/nf-modules/nextflow.config
docker.enabled = true
docker.registry = "quay.io"

params {
    input = "/home/diya/nf-training/chrX_data/test/*_{1,2}.fastq.gz"
    fasta = "/home/diya/nf-training/chrX_data/genome/chrX.fa"
    single_end                 = false
    trimmomatic_illuminaclip   = 'NexteraPE-PE.fa:2:30:10:8:TRUE'
    trimmomatic_sliding_window = '4:20'
    trimmomatic_min_length     = 50

}
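Execution modes are commonly organised into profiles, selected at runtime with the -profile option (e.g. nextflow run main.nf -profile docker). A minimal sketch of a hypothetical profiles block:

```groovy
profiles {
    docker {
        docker.enabled      = true
    }
    singularity {
        singularity.enabled = true
    }
    conda {
        conda.enabled       = true
    }
}
```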

4 Exercise

4.1 Zifo-Nextflow

Document: Example SOW for developing a Nextflow computational pipeline
Version: 0.1

Under this Statement of Work the Supplier shall deliver the following Services and Deliverables:

4.1.1 Scope of Work

Zifo is supporting the client by producing a computational pipeline for transcript-level expression analysis of RNA-seq experiments.

4.1.2 Description of Services and Deliverables

Zifo to perform and support the following activities:

  • Develop a Nextflow pipeline which satisfies the following requirements:
    • Create a Nextflow pipeline template.
    • Use a samplesheet to pass the input raw sample files.
    • Following the nf-core folder structure, develop Nextflow scripts to automate the protocol as described in the Nature paper titled “Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown”.
    • The paper and materials are provided by the client in the data/Nextflow_SOW directory.
  • The pipeline must adhere to the following standards:
    • Use the latest version of the language specification (i.e., DSL2).
    • Use Docker containers to install and execute software.
    • Use sub-workflows for each major task (e.g., alignment, quantification).
    • Run to completion without warnings or errors.
    • Maintain the nf-core standards.